Deducing subnanometer cluster size and shape distributions of heterogeneous supported catalysts

Infrared (IR) spectra of adsorbate vibrational modes are sensitive to adsorbate/metal interactions, accurate, and easily obtainable in-situ or operando. While they are the gold standards for characterizing single-crystals and large nanoparticles, analogous spectra for highly dispersed heterogeneous catalysts consisting of single-atoms and ultra-small clusters are lacking. Here, we combine data-based approaches with physics-driven surrogate models to generate synthetic IR spectra from first-principles. We bypass the vast combinatorial space of clusters by determining viable, low-energy structures using machine-learned Hamiltonians, genetic algorithm optimization, and grand canonical Monte Carlo calculations. We obtain first-principles vibrations on this tractable ensemble and generate single-cluster primary spectra analogous to pure component gas-phase IR spectra. With such spectra as standards, we predict cluster size distributions from computational and experimental data, demonstrated in the case of CO adsorption on Pd/CeO2(111) catalysts, and quantify uncertainty using Bayesian Inference. We discuss extensions for characterizing complex materials towards closing the materials gap.


Supplementary Note 1: Low-Energy Ensemble via Delaunay Triangulation
To enumerate all possible events by finding all available physical adsorption sites, we utilize an algorithm called Delaunay Triangulation (DT) 1 . DT takes a given set of points P in space and generates a triangular mesh such that no point in P lies inside the circumcircle of any ndimensional triangle, DT(P). We allow the set of points P to be the atomic positions of all Pd atoms in a given cluster. All atoms that are located on a triangle whose vertices are not a shared face of multiple tetrahedrons are considered surface atoms. There are many different algorithms with varying run time scalings reported; we choose the one of Lee et al. 2 . Vertices, edges, and centroids of the surface triangles correspond to atop, bridge, and three-fold sites, respectively.
Adsorption vectors for atop and bridge sites are computed to maximize the distance away from all atoms, while adsorption vectors for three-fold sites correspond to the normal vector of the plane spanned by the surface triangle pointing away from the cluster. This methodology of enumerating all physical adsorption sites allows for more accurate initial structures for DFT calculations. The structures are ranked from lowest to highest in energy for a given cluster size to determine the low-energy ensemble for direct DFT frequency calculations.

Supplementary Note 2: Boltzmann Probabilities for Pd5-Pd19
We choose a 95% integrated probability density as a cutoff, as modern FTIR spectrometers with a resolution of 2 cm -1 typically have a signal-to-noise ratio (SNR) in the order 400 at the frequencies of the highest observed intensity peaks (i.e., C-O stretch region of 1600-2000 cm -1 ).
The SNR is defined as: where P is the average power measured at equivalent frequencies. Alternatively, we can write the ratio as where A is the root mean square amplitude. For an SNR of 400, the ratio of the amplitudes is 20.
We expect 95% of the observed signal to be attributed to the system and 5% to noise. As a result, cluster configurations with Boltzmann probabilities outside the 95% bin contribute intensities indistinguishable from noise. Here, we show the ensemble and Boltzmann probability densities for Pd5-Pd19 at 323 K with saturated CO coverage, as well as the corresponding 95% integrated probability density ( Figure S1). The Boltzmann probability for a given configuration, , is computed as follows from statistical mechanics 3 : where ∆ is the free energy of the adsorbate cluster configuration i, and M and j are the summation over all adsorbate cluster configurations. The ensemble probability and Boltzmann probability density curves are fit using a Gaussian kernel-density and an exponential distribution, respectively. Each point on the curves represents discrete configurations of CO and Pd clusters for a given isomer size. We perform an analogous Boltzmann probability analysis on bare Pd clusters to understand the effect on CO on the number of thermodynamically accessible states ( Figure S2). Rather than utilizing the free energy as in Equation (S3) for the calculation, we use the electronic energies from our Hamiltonian, as vibrational contributions to the enthalpy are negligible without adsorbates.

Primary Spectra
In the main text, we demonstrate the effect of CO coverage on the primary spectra of clusters, and how it is advantageous to operate at saturated coverage regimes for more discernable spectroscopic signatures. At differential coverages, a single high-intensity CO peak exists for each primary spectra; at this coverage, we find that the frequency of the CO peak is highly correlated with the adsorption site-type (i.e., the number of metal atoms the adsorbate is coordinated with) for all cluster sizes Pd1-Pd20. Figure S3 shows the distributions of CO stretch frequencies versus site type for the aggregated dataset of all cluster sizes. For all cluster sizes, the adsorption site type (i.e., atop, bridge, hollow) almost entirely determines the major spectroscopic peak of our primary spectra. This is mathematically problematic for deconvolution purposes as it leads to an ill-posed problem for traditional frequentist-type statistics, and broad and uninformative posterior distributions for Bayesian-type statistics. This further motivates our decision to operate at the saturated CO coverage regime for the purposes of this work.
In the main text, we have shown examples that clusters of varying sizes at the saturated CO coverage regime. However, we have found that primary spectra of isomers (i.e., of the same cluster size) also exhibit unique spectroscopic signatures and primary spectra, with differences of the same order-of-magnitude as spectra of different cluster sizes. Figure S4 shows an example of the differences between the primary spectra of 4 different isomers of Pd10. Due to the large differences in CO adsorption configurations, it is easy to distinguish between isomers based on their spectroscopic signature.

Supplementary Note 4: Accounting for DFT Error via Linear Scaling Factors
It is customary to apply scaling factors to correct for systematic errors of first-principles computations when compared to experimental results due to errors of the functional used as well as the harmonic approximation of the potential energy surface. In addition, computational modeling of adsorption systems assumes an infinite metal mass approximation. While this approximation reduces the computational cost, it causes calculated frequencies of normal vibration modes driven by adsorbate/metal interactions to be systematically lower than those found experimentally 4 . As a result, we develop linear scaling factors, α, from CO adsorption spectra on Pd single crystals to correct DFT-calculated frequencies.
Here, ν EXP is the experimental frequency and ν DFT is the DFT-computed frequency. The experimental frequencies chosen are the frequencies that correspond to the maximum intensity of the experimental spectrum within the C-O stretch frequency range of 1600-2200 cm -1 . As the exact overlayer and metal structure are known, we compute their associated C-O stretch frequencies from DFT and calculate a linear scaling factor (and its relative uncertainty) using equations provided by NIST 5 . The details of the DFT calculations are described in the Methods section.
Applying the equations outlined in the methods with the data in Table S1 results in a scaling factor of 1.017 with a standard deviation of 0.0004. Application of linear scaling factor shifts the DFT computed spectra to better match experimental spectra. We show an example of how scaling factors applied to DFT-computed spectra better match experiments in Figure S6.
However, this linear scaling factor only directly applies to adsorption frequencies associated with extended metal surfaces and may not represent the true errors associated with supported single atoms and clusters. We utilize this distribution as a prior distribution as our best estimate for the DFT errors for spectra deconvolution. The "true" linear scaling factor is fit during the Bayesian Inference procedure and utilizes the bounds for the scaling factor on well-defined Pd crystals as a regularization mechanism to prevent overfitting.

Supplementary Note 5: Error in Bayesian Spectra Deconvolution
As described in the Methods section, we model the infrared spectrum, y ⃗ , as the vector sum of discretized primary spectra, , weighted by their relative concentration, , plus some noise, ɛ: The noise term, ɛ, is modeled as a multivariate normal distribution with expected value of 0 and 2 . We claim that is equivalent to the reciprocal of the amplitude ratio. Beginning from Equation (S1), we can rewrite the SNR as 6 : where E refers to the expected value, S is the observed signal, and N is random noise. Both the signal and noise must be measured at equivalent points in the system, and in our case, identical frequencies.
If the noise has an expected value of 0, i.e., E[ ] = 0, we can rewrite the denominator as: where V is the variance. By substituting and rearranging at each frequency, we observe the following: The error parameter, σ, is equivalent to the reciprocal of the amplitude ratio.

Supplementary Note 6: Markov Chain Monte Carlo with Stan
We utilize the open-source Bayesian inference software PyStan, a Python interface to Stan, to parameters are needed that affect the final spectral shape and intensities: full-width halfmaximum (FWHM), fraction Lorentzian (fL), and linear scaling factor (α). The shape parameters, FWHM and fL, are associated with spectral broadening that occurs in experimental spectra. Spectral broadening is mainly associated with lifetime broadening, thermal broadening, and pressure broadening 9 . No physical models currently exist to model these phenomena for metal adsorbates, so we choose a diffusive prior to describe them. There have been efforts to study intrinsic peak widths of adsorbed CO through single crystals with well-defined overlayers as the obtained spectra have minimal overlapping peaks due to uniformity in adsorbates, but the applicability to clusters remains unknown 10 . The linear scaling factor, α, corrects for errors in computed frequencies due to the harmonic approximation and utilizes a weakly informative prior derived in previous sections of the Supplementary Information using data collected from UHV experiments. Finally, for the N cluster fraction parameters, we use a diffuse prior, as no prior information is known about the system at hand. Information about cluster size fractions from other spectroscopy or microscopy-based characterization used in tandem can be leveraged to hypothesize a weakly informative or informative prior.

Supplementary Note 8: Performance Benchmarking
In the main text, we demonstrated the ability of our Bayesian deconvolution method in predicting subnanometer cluster size distributions directly from both synthetic and experimental spectra. Here, we show the efficacy of our methodology on 100 synthetic spectra with varying artificial Gaussian noise constructed following the Methods section and Equation S5. Both the relative concentrations of each species from Pd1-Pd20 as well as the extent of Gaussian noise for each spectrum are randomized (these values are noted as ground truths for benchmarking purposes). Noise is simulated with signal-to-noise ratios ranging from infinity (e.g., infinitesimally small noise, the limit as σ approaches 0) to 25 (e.g., the lowest SNR of FTIR receivers reported in the literature, the limit as σ approaches 0.2) by uniformly sampling values of 0<σ <0.20. Note both bounds of signal-to-noise ratios are unrealistic to expect for observed experimental spectra obtained from modern-day FTIR receivers and purely serve as limits for benchmarking. We show a parity plot of the predicted cluster size fractions, binned in three categories (single-atoms for Pd1, 2D clusters for Pd2-Pd5, and 3D clusters for Pd6-Pd20) as in the main text, as a function of the true value of σ for each spectrum in Figure S5. As the Bayesian deconvolution of each spectrum yields a distribution of sampled values, we utilize the maximum a posteriori (MAP) estimation, or the mode of the distribution, as point estimates for the parity plot. The MAP estimate and mean of the distributions were found to be essentially identical over 100 spectra. For 100 spectra, the deconvolution procedure yields a mean absolute error of 0.049 for the predicted cluster fractions over 300 data points (as each spectrum yields 3 predictions) and the entirety of the range of simulated noise values. Surprisingly, the prediction error is not correlated with the amount of the simulated noise for the SNR values studied, corresponding to 0<σ <0.20. This is a good indication that the model is robust enough to avoid overfitting spectra to noise. More importantly, however, is that for all 100 spectra, the true cluster fraction lies within the 95% credible interval of the distribution.  Each point along the probability density curves represents a discrete Pdn configuration. The ensemble probability density assumes each discrete state is equally probable, whereas the Boltzmann probability density weights each discrete state by its respective Boltzmann factor.
The shaded red region represents the integrated 95% probability density. At low temperatures, relatively few discrete states are energetically accessible and dominate the ensemble of isomers compared to high temperatures. The number of discrete states that meet the 95% cutoff criteria range from 10 to 40 states over the range of Pd5-Pd20. Figure S3. CO stretch frequencies at differential coverage versus site type. Each violinplot shows the distribution of frequencies of the aggregated data for all cluster sizes of Pd1-Pd20. The width of the curves corresponds to the relative probability density of observing a given CO stretch frequency. We find that at differential coverage, the CO stretch frequency is highly correlated with the site type, which makes the primary spectra generated mathematically difficult to utilize for deconvolution purposes.     Table S1. Experimental and DFT C-O stretch frequencies from spectra with wellcharacterized coverages and overlayers. We utilize these values to generate linear scaling factors to correct for errors in DFT-calculated frequencies. We calculate a scaling factor of 1.017 with an uncertainty of 0.0004.